Use bulk conversion in BCMath of BCD/CHAR where possible #14103
amazing!
ngl, I don't really understand what the code is doing :/ I trust it to do what it says, but I don't really understand the logic
ext/bcmath/libbcmath/src/convert.c
#define SWAR_ONES (~((size_t) 0) / 0xFF)
#define SWAR_REPEAT(x) (SWAR_ONES * (x))

static char *bc_copy_and_shift_numbers(char *dest, const char *source, const char *source_end, unsigned char shift, bool add)
Could you possibly add the restrict keyword to dest and source? These APIs are only used from other C files, so we don't need to stay compatible with C++.
Right, I'll add the restrict keyword.
I'll explain what the code does.
The intention is to copy each byte from source to dest while adding '0' to, or subtracting '0' from, each byte.
The idea of this patch is to read and write 4 or 8 bytes at once (one machine word), so that the add/subtract of '0' is applied to all of those bytes in parallel.
SWAR_ONES will be of the form 0x01010101 for 32-bit or 0x0101010101010101 for 64-bit.
Example: SWAR_ONES * 0xAB will therefore be equal to 0xABABABAB for 32-bit or 0xABABABABABABABAB for 64-bit.
So in this case, for SWAR_REPEAT('0'), it will be a 32/64-bit word where each byte is equal to '0', i.e. 0x303030...
Since we know that the add/subtract can never carry or borrow from one byte into the next, we can add or subtract 0x303030... on the entire 4/8 bytes at once, which is equivalent to adding or subtracting 0x30 on each byte individually.
And to be complete: SWAR stands for "SIMD Within A Register"
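To make the explanation above concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the patch: the function name copy_and_shift_digits is hypothetical, and it assumes every input byte is either an ASCII digit (when subtracting) or a value 0..9 (when adding), so no carry or borrow can cross a byte boundary. It uses memcpy for the word loads and stores to avoid alignment and aliasing problems; the actual patch may do this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* 0x01010101 on 32-bit, 0x0101010101010101 on 64-bit. */
#define SWAR_ONES (~((size_t) 0) / 0xFF)
/* Repeats the byte x in every byte of a size_t word. */
#define SWAR_REPEAT(x) (SWAR_ONES * (x))

/* Hypothetical helper, for illustration only.
 * Copies [source, source_end) to dest, adding (add == true) or
 * subtracting (add == false) `shift` on every byte.
 * Assumes no per-byte overflow/underflow, e.g. shift == '0' and the
 * input holds only digits. Returns the end of the written range. */
static char *copy_and_shift_digits(char *restrict dest, const char *restrict source,
                                   const char *source_end, unsigned char shift, bool add)
{
    assert(source <= source_end);
    const size_t shift_word = SWAR_REPEAT(shift);

    /* Bulk path: handle sizeof(size_t) bytes per iteration. */
    while ((size_t) (source_end - source) >= sizeof(size_t)) {
        size_t word;
        memcpy(&word, source, sizeof(word));
        word = add ? word + shift_word : word - shift_word;
        memcpy(dest, &word, sizeof(word));
        source += sizeof(size_t);
        dest += sizeof(size_t);
    }

    /* Tail: fewer than sizeof(size_t) bytes remain, go byte by byte. */
    while (source < source_end) {
        *dest = add ? (char) (*source + shift) : (char) (*source - shift);
        dest++;
        source++;
    }
    return dest;
}
```

Calling copy_and_shift_digits(out, str, str + len, '0', false) would turn the characters '0'..'9' into the values 0..9, and the same call with add set to true would reverse it. Because each byte is shifted independently and nothing carries between bytes, the result does not depend on the endianness of the machine.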
Ahhh okay, well could you please write this in a comment in the file? :D
On my i7-4790, with the benchmark from #14076 and on top of #14101, I obtain the following results:
before (with #14101):
after (with #14101 + this):